fix(graph): cap /banks/{bank_id}/graph response size to keep Control Plane parseable on dense banks#1605
Conversation
… Node-parseable The graph endpoint accepts ``limit`` and the Control Plane UI defaults to 1000. Edge count scales quadratically with node count for densely-linked banks (memory_units cross-linked via memory_links). Empirical measurement on a 14k-memory bank: limit=200 → 23 MiB, limit=500 → 147 MiB, limit=1000 → 596 MiB. Next.js deserializes the response as a single JS string via response.json(); V8 caps string length at 0x1fffffe8 (~512 MiB). Banks past that cap error out with "Cannot create a string longer than 0x1fffffe8 characters" and the UI shows "Failed to fetch graph data" instead of any graph. Server-side clamp at 200 keeps the API usable for high-volume banks while preserving response payload semantics for smaller ones (where 200 is well under typical graph sizes anyway). Graph viz tools are unusable past ~200 nodes regardless, so this isn't a UX regression. Real fix is paginated/streamed graph loading in the UI; this is a guard to prevent the existing UI from breaking on large banks until that lands.
There was a problem hiding this comment.
Pull request overview
Caps the server-side limit parameter for GET /v1/default/banks/{bank_id}/graph to prevent extremely large graph responses that can exceed V8/Next.js parsing limits on dense banks, keeping the Control Plane UI usable at scale.
Changes:
- Clamp
limitinapi_graphto the range[1, 200]to bound response size. - Add inline rationale/comments explaining why the cap is needed (dense-link quadratic edge growth + V8 string cap).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address Copilot review on vectorize-io#1605: 1. Move limit signature default from 1000 to 200 via Query(...) so OpenAPI docs match runtime behavior. ge=1 added for explicit lower bound. The silent clamp at 200 is preserved (no 422) so existing UI/SDK callers passing limit > 200 keep working with a smaller response. Description on the Query field documents the cap and rationale. 2. Add tests/test_graph_endpoint_clamp.py: - test_graph_limit_above_cap_is_silently_clamped: assert request with limit=1000 returns 200 OK and response.limit == 200, nodes <= 200. - test_graph_limit_below_cap_passes_through: assert request with limit=50 returns 200 OK and response.limit == 50 (no modification).
|
Closing this. After reviewing, the server-side clamp masks the symptom but creates a new one: the Control Plane's The 596 MiB payload isn't really driven by node count either. Skimming
Better path, in priority order:
Going to start with (1) on a fresh branch, then profile a dense bank to confirm where the bytes actually go before chasing (2)-(4). |
Summary
Cap the
limitparameter onGET /v1/default/banks/{bank_id}/graphserver-side at 200. Edge count scales quadratically with node count for densely-linked banks, and the Control Plane (Next.js) deserializes the response as a single JS string which V8 caps at 0x1fffffe8 (~512 MiB).Repro
On a 14k-memory bank with ~835k memory_links and 17k entities:
limitThe Control Plane's
useState(1000)indata-view.tsxsets the initial fetchLimit, which falls within the API's oldlimit: int = 1000signature. Smaller banks fit; banks past ~10k highly-linked memories silently break the UI.What changes
A single
min(max(...))clamp inside theapi_graphhandler. Smaller queries pass through unchanged; large ones clamp to 200 with no error response. UI continues to function on dense banks instead of erroring out entirely.Why 200
Why server-side
The UI fix would be a one-line change to
data-view.tsx'suseState(1000)→useState(200). Either works; server-side caps protect SDK and direct API consumers too without depending on the UI staying in sync. Open to swapping for a UI-only fix if maintainers prefer.Real fix
This is a guard, not a solution. Long-term the graph endpoint should support pagination or streaming so high-density banks can be explored without truncation. Filing this as a stop-gap so the Control Plane stops 500'ing on production banks at scale.
Test plan
GraphDataResponseJSON.limit ≤ 200requests.pre-commithooks (ruff check --fix,ruff format,ty check) all green.